on-device model
LightAgent: Mobile Agentic Foundation Models
With the advancement of multimodal large language models (MLLMs), building GUI agent systems has become an increasingly promising direction, especially for mobile platforms, given their rich app ecosystems and intuitive touch interactions. Yet mobile GUI agents face a critical dilemma: truly on-device models (4B or smaller) lack sufficient performance, while capable models (starting from 7B) are either too large for mobile deployment or prohibitively costly (e.g., cloud-only closed-source MLLMs). To resolve this, we propose LightAgent, a mobile agentic foundation model solution that leverages device-cloud collaboration to tap the cost-efficiency of on-device models and the high capability of cloud models, while avoiding their drawbacks. Specifically, LightAgent enhances Qwen2.5-VL-3B via two-stage SFT->GRPO training on synthetic GUI data for strong decision-making, integrates an efficient long-reasoning mechanism to utilize historical interactions under tight resources, and defaults to on-device execution, escalating only challenging subtasks to the cloud via real-time complexity assessment. Experiments on the online AndroidLab benchmark and diverse apps show LightAgent matches or nears larger models, with a significant reduction in cloud costs.
- Workflow (0.68)
- Research Report (0.50)
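The real-time complexity assessment that LightAgent uses to decide between on-device execution and cloud escalation could be sketched roughly as below. The signals (UI element count, history length, cross-app hops), their weights, and the threshold are all illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of LightAgent-style device-cloud routing: subtasks run
# on-device by default and escalate to the cloud when a complexity estimate
# crosses a threshold. All signals and weights below are assumptions.

def complexity_score(subtask: dict) -> float:
    """Toy complexity estimate in [0, 1] from hand-picked signals."""
    score = 0.0
    score += 0.2 * min(subtask.get("ui_elements", 0) / 50, 1.0)    # dense screens are harder
    score += 0.4 * min(subtask.get("history_steps", 0) / 20, 1.0)  # long horizons are harder
    score += 0.4 * (1.0 if subtask.get("cross_app", False) else 0.0)
    return score

def route(subtask: dict, threshold: float = 0.6) -> str:
    """Run on-device by default; escalate to the cloud above the threshold."""
    return "cloud" if complexity_score(subtask) > threshold else "device"
```

In a real agent loop the score would come from a learned assessor rather than a hand-tuned heuristic, but the control flow is the same.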
Apple Intelligence Foundation Language Models: Tech Report 2025
Li, Ethan, Larsen, Anders Boesen Lindbo, Zhang, Chen, Zhou, Xiyou, Qin, Jun, Yap, Dian Ang, Raghavan, Narendran, Chang, Xuankai, Bowler, Margit, Yildiz, Eray, Peebles, John, Coleman, Hannah Gillis, Ronchi, Matteo, Gray, Peter, You, Keen, Spalvieri-Kruse, Anthony, Pang, Ruoming, Li, Reed, Yang, Yuli, Soroush, Emad, Lu, Zhiyun, Xiao, Crystal, Situ, Rong, Huffaker, Jordan, Griffiths, David, Ahmed, Zaid, Zhang, Peng, Parilla, Daniel, Liberman, Asaf, Mallalieu, Jennifer, Mazaheri, Parsa, Chen, Qibin, Bilkhu, Manjot, Zhang, Aonan, Wang, Eric, Nelson, Dave, FitzMaurice, Michael, Voice, Thomas, Liu, Jeremy, Shaffer, Josh, Zhao, Shiwen, Yadla, Prasanth, Rasteh, Farzin, Guo, Pengsheng, Farooq, Arsalan, Snow, Jeremy, Murphy, Stephen, Lei, Tao, Cho, Minsik, Horrell, George, Dodge, Sam, Hislop, Lindsay, Singh, Sumeet, Dombrowski, Alex, Raghavan, Aiswarya, Sirovica, Sasha, Saebi, Mandana, Lao, Faye, Lam, Max, Lu, TJ, Xu, Zhaoyang, Singh, Karanjeet, Kirchner, Marc, Mizrahi, David, Arora, Rajat, Zhang, Haotian, Mason, Henry, Zhou, Lawrence, Hua, Yi, Jain, Ankur, Bai, Felix, Astrauskas, Joseph, Weers, Floris, Gardner, Josh, Chiang, Mira, Zhang, Yi, Agrawal, Pulkit, Sun, Tony, Keunebroek, Quentin, Hopkins, Matthew, Wu, Bugu, Jia, Tao, Chen, Chen, Zhou, Xingyu, Wang, Nanzhu, Liu, Peng, Hou, Ruixuan, Rauch, Rene, Gao, Yuan, Dehghan, Afshin, Janke, Jonathan, Wang, Zirui, Chen, Cha, Ren, Xiaoyi, Nan, Feng, Elman, Josh, Yin, Dong, Goren, Yusuf, Lai, Jeff, Fei, Yiran, Evans, Syd, Yu, Muyang, Yin, Guoli, Qin, Yi, Feldman, Erin, Garg, Isha, Rajamani, Aparna, Vega, Karla, Cheng, Walker, Collins, TJ, Han, Hans, Menacho, Raul Rea, Yeung, Simon, Lee, Sophy, Mutyala, Phani, Cheng, Ying-Chang, Gan, Zhe, Chu, Sprite, Lazarow, Justin, Pappalardo, Alessandro, Scozzafava, Federico, Lu, Jing, Daxberger, Erik, Duchesne, Laurent, Liu, Jen, Güera, David, Ligas, Stefano, Kery, Mary Beth, Ramerth, Brent, Sannino, Ciro, Eichner, Marcin, Huang, Haoshuo, Qian, Rui, Schwarzer-Becker, Moritz, Riazati, David, 
Gao, Mingfei, Wang, Bailin, Cackler, Jack, Lu, Yang, Niu, Ransen, Dennison, John, Klein, Guillaume, Bigham, Jeffrey, Gopinath, Deepak, Shiee, Navid, Botten, Darren, Tartavel, Guillaume, Garcia, Alex Guillen, Xu, Sam, Haladjian, Victoria MönchJuan, Dou, Zi-Yi, Paulik, Matthias, Mendez, Adolfo Lopez, Li, Zhen, Chen, Hong-You, Jia, Chao, Doshi, Dhaval, Zhang, Zhengdong, Manjani, Raunak, Franklin, Aaron, Ren, Zhile, Chen, David, Peshko, Artsiom, Raghuram, Nandhitha, Hao, Hans, Shan, Jiulong, Nerella, Kavya, Tantawi, Ramsey, Kumar, Vivek, Wang, Saiwen, Wershing, Brycen, Dhingra, Bhuwan, Shah, Dhruti, Adaranijo, Ob, Zheng, Xin, Madsen, Tait, Kotek, Hadas, Liu, Chang, Xia, Yin, Li, Hanli, Jayaram, Suma, Sun, Yanchao, Fakhry, Ahmed, Saveris, Vasileios, Withers, Dustin, Li, Yanghao, Aygar, Alp, Teran, Andres Romero Mier Y, Huang, Kaiwei, Lee, Mark, Li, Xiujun, Li, Yuhong, Johnson, Tyler, Tang, Jay, Cheng, Joseph Yitan, Peng, Futang, Walkingshaw, Andrew, Guibert, Lucas, Sharma, Abhishek, Shen, Cheng, Maj, Piotr, Tanaka, Yasutaka, Jhang, You-Cyuan, Ma, Vivian, Vehvilainen, Tommi, Zou, Kelvin, Nichols, Jeff, Lei, Matthew, Qiu, David, Qian, Yihao, Santhanam, Gokul, Wu, Wentao, Han, Yena, Moritz, Dominik, Fu, Haijing, Xu, Mingze, Rathod, Vivek, Liu, Jian, D'hauwe, Louis, Ba, Qin, Sun, Haitian, Yan, Haoran, Dufter, Philipp, Nguyen, Anh, Feng, Yihao, Wang, Emma, He, Keyu, Nair, Rahul, Shah, Sanskruti, Lu, Jiarui, Sonnenberg, Patrick, Warner, Jeremy, Li, Yuanzhi, Pan, Bowen, Zhong, Ziyi, Zhou, Joe, Davarnia, Sam, Saarikivi, Olli, Belousova, Irina, Burger, Rachel, Wu, Shang-Chen, Feng, Di, Straathof, Bas, Chou, James, Zhang, Yuanyang, Zuliani, Marco, Jimenez, Eduardo, Sundararajan, Abhishek, Du, Xianzhi, Lan, Chang, Shahdadpuri, Nilesh, Grasch, Peter, Sima, Sergiu, Newnham, Josh, Paidi, Varsha, Wang, Jianyu, Haag, Kaelen, Braunstein, Alex, Molinari, Daniele, Wei, Richard, Yang, Brenda, Lusskin, Nicholas, Arreaza-Taylor, Joanna, Cao, Meng, Seidl, Nicholas, Wang, Simon, Hu, Jiaming, 
Ma, Yiping, Li, Mengyu, Liu, Kieran, Su, Hang, Ravi, Sachin, Wang, Chong, Wang, Xin, Smith, Kevin, You, Haoxuan, Karimzadeh, Binazir, Li, Rui, Lei, Jinhao, Fang, Wei, Doane, Alec, Wiseman, Sam, Fernandez, Ismael, Li, Jane, Hansen, Andrew, Movellan, Javier, Neubauer, Christopher, Zhou, Hanzhi, Chaney, Chris, Kamaldin, Nazir, Wolf, Valentin, Bermúdez-Medina, Fernando, Pelemans, Joris, Fu, Peter, Xing, Howard, Kong, Xiang, Shan, Wayne, Jacoby-Cooper, Gabriel, Shen, Dongcai, Gunter, Tom, Seguin, Guillaume, Shi, Fangping, Li, Shiyu, Xu, Yang, Kamal, Areeba, Masi, Dan, Guha, Saptarshi, Zhu, Qi, Thibodeau, Jenna, Zhang, Changyuan, Callahan, Rebecca, Maalouf, Charles, Tsao, Wilson, Li, Boyue, Cao, Qingqing, Sabo, Naomy, Leong, Cheng, Wang, Yi, Anupama, Anupama Mann, Reed, Colorado, Jung, Kenneth, Chen, Zhifeng, Moorthy, Mohana Prasad Sathya, He, Yifei, Hornberger, Erik, Krishna, Devi, Tong, Senyu, Michael, null, Lee, null, Haldimann, David, Zhao, Yang, Zhang, Bowen, Gao, Chang, Bartels, Chris, Rao, Sushma, Tran, Nathalie, Lehnerer, Simon, Giang, Co, Dong, Patrick, Pan, Junting, Wang, Biyao, Li, Dongxu, Farajtabar, Mehrdad, Hwang, Dongseong, Duanmu, Grace, Verma, Eshan, Reddy, Sujeeth, Shan, Qi, Gao, Hongbin, Du, Nan, Sridhar, Pragnya, Huang, Forrest, Wang, Yingbo, Bhendawade, Nikhil, Zhu, Diane, Aitharaju, Sai, Hohman, Fred, Gardiner, Lauren, Chiu, Chung-Cheng, Yang, Yinfei, Kokmen, Alper, Chu, Frank, Ye, Ke, Elgin, Kaan, Levy, Oron, Park, John, Zhang, Donald, Schoop, Eldon, Wenzel, Nina, Booker, Michael, Kim, Hyunjik, Erdenebileg, Chinguun, Dun, Nan, Yang, Eric Liang, Chhatrapati, Priyal, Mahtani, Vishaal, Gang, Haiming, Chia, Kohen, Seshadri, Deepa, Yu, Donghan, Meng, Yan, Peterson, Kelsey, Yang, Zhen, Wang, Yongqiang, Peng, Carina, Kang, Doug, Agarwal, Anuva, Antony, Albert, Tebar, Juan Lao, Jose, Albin Madappally, Poston, Regan, De Wang, Andy, Casamayor, Gerard, Amirloo, Elmira, Yao, Violet, Kryscinski, Wojciech, Duan, Kun, L, Lezhi
We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a 3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transformer that combines track parallelism, mixture-of-experts sparse computation, and interleaved global-local attention to deliver high quality with competitive cost on Apple's Private Cloud Compute platform. Both models are trained on large-scale multilingual and multimodal datasets sourced via responsible web crawling, licensed corpora, and high-quality synthetic data, then further refined with supervised fine-tuning and reinforcement learning on a new asynchronous platform. The resulting models support several additional languages while understanding images and executing tool calls. In public benchmarks and human evaluations, both the server model and the on-device model match or surpass comparably sized open baselines. A new Swift-centric Foundation Models framework exposes guided generation, constrained tool calling, and LoRA adapter fine-tuning, allowing developers to integrate these capabilities with a few lines of code. The latest advancements in Apple Intelligence models are grounded in our Responsible AI approach with safeguards like content filtering and locale-specific evaluation, as well as our commitment to protecting our users' privacy with innovations like Private Cloud Compute.
- Europe > United Kingdom (0.04)
- North America > United States > Colorado (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (4 more...)
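The 2-bit quantization-aware training mentioned for the on-device model relies on "fake quantization" during training: the forward pass rounds weights to a tiny grid so the network learns to tolerate the quantized representation. A minimal sketch of the forward-pass rounding, assuming symmetric per-tensor scaling (the report does not specify the exact scheme, so treat this as illustrative):

```python
import numpy as np

def fake_quant_2bit(w: np.ndarray) -> np.ndarray:
    """Round weights to a signed 2-bit grid {-2, -1, 0, 1} * scale.

    This is only the forward pass; in QAT the backward pass would use a
    straight-through estimator to pass gradients through the rounding.
    """
    qmin, qmax = -2, 1                                  # signed 2-bit integer range
    scale = np.abs(w).max() / max(abs(qmin), qmax)      # per-tensor symmetric scale
    q = np.clip(np.round(w / scale), qmin, qmax)        # quantize to integer grid
    return q * scale                                    # dequantize back to float
```

With only four representable levels, the scale choice dominates accuracy, which is why such low-bit schemes are trained quantization-aware rather than quantized after the fact.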
The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks
Lyu, Zhonghao, Xiao, Ming, Xu, Jie, Skoglund, Mikael, Di Renzo, Marco
The growing demand for large artificial intelligence model (LAIM) services is driving a paradigm shift from traditional cloud-based inference to edge-based inference for low-latency, privacy-preserving applications. In particular, edge-device co-inference, which partitions LAIMs between edge devices and servers, has emerged as a promising strategy for resource-efficient LAIM execution in wireless networks. In this paper, we investigate a pruning-aware LAIM co-inference scheme, where a pre-trained LAIM is pruned and partitioned into on-device and on-server sub-models for deployment. For analysis, we first prove that the LAIM output distortion is upper bounded by its parameter distortion. Then, we derive a lower bound on parameter distortion via rate-distortion theory, analytically capturing the relationship between pruning ratio and co-inference performance. Next, based on the analytical results, we formulate an LAIM co-inference distortion bound minimization problem by jointly optimizing the pruning ratio, transmit power, and computation frequency under system latency, energy, and available resource constraints. Moreover, we propose an efficient algorithm to tackle the considered highly non-convex problem. Finally, extensive simulations demonstrate the effectiveness of the proposed design. In particular, model parameter distortion is shown to provide a reliable bound on output distortion. Also, the proposed joint pruning ratio and resource management design achieves superior performance in balancing trade-offs among inference performance, system latency, and energy consumption compared with benchmark schemes, such as fully on-device and on-server inference. Moreover, the split point is shown to play a critical role in system performance optimization under heterogeneous and resource-limited edge environments.
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > France (0.04)
- (2 more...)
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
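The analysis above bounds co-inference output distortion by parameter distortion and relates that to the pruning ratio. A toy sketch of magnitude pruning and the squared-L2 parameter distortion it induces makes the monotone relationship concrete; magnitude pruning is a common criterion used here for illustration, not necessarily the paper's exact choice:

```python
import numpy as np

def magnitude_prune(w: np.ndarray, ratio: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction `ratio` of weights."""
    k = int(ratio * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value serves as the pruning threshold
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    pruned = w.copy()
    pruned[np.abs(w) <= thresh] = 0.0
    return pruned

def param_distortion(w: np.ndarray, w_pruned: np.ndarray) -> float:
    """Squared-L2 parameter distortion, the quantity the paper uses to
    upper-bound output distortion."""
    return float(np.sum((w - w_pruned) ** 2))
```

Because a higher pruning ratio zeroes a superset of the weights zeroed at a lower ratio, parameter distortion (and hence the distortion bound) grows monotonically with the ratio.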
THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models
Huang, Yujin, Zhang, Zhi, Zhao, Qingchuan, Yuan, Xingliang, Chen, Chunyang
On-device deep learning (DL) has rapidly gained adoption in mobile apps, offering the benefits of offline model inference and user privacy preservation over cloud-based approaches. However, it inevitably stores models on user devices, introducing new vulnerabilities, particularly model-stealing attacks and intellectual property infringement. While system-level protections like Trusted Execution Environments (TEEs) provide a robust solution, practical challenges remain in achieving scalable on-device DL model protection, including complexities in supporting third-party models and limited adoption in current mobile solutions. Advancements in TEE-enabled hardware, such as NVIDIA's GPU-based TEEs, may address these obstacles in the future. Currently, watermarking serves as a common defense against model theft, but it also faces challenges here: many mobile app developers lack the corresponding machine-learning expertise, and the inherently read-only, inference-only nature of on-device DL models prevents third parties like app stores from applying existing watermarking techniques to post-deployment models. To protect the intellectual property of on-device DL models, in this paper we propose THEMIS, an automatic tool that lifts the read-only restriction of on-device DL models by reconstructing their writable counterparts, and that leverages the untrainable nature of on-device DL models to solve for watermark parameters and protect the model owner's intellectual property. Extensive experimental results across various datasets and model structures show the superiority of THEMIS in terms of different metrics. Further, an empirical investigation of 403 real-world DL mobile apps from Google Play is performed with a success rate of 81.14%, showing the practicality of THEMIS.
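As a loose illustration of white-box watermarking of the kind THEMIS solves for, the sketch below minimally shifts a weight vector so that the signs of its projections onto secret key vectors encode a bit string. This projection-sign scheme is a simplified stand-in in the spirit of classical white-box watermarks, not the method THEMIS actually uses:

```python
import numpy as np

def embed_watermark(w: np.ndarray, X: np.ndarray, bits, margin: float = 0.1) -> np.ndarray:
    """Minimally adjust w so that sign(X[i] @ w) encodes bits[i].

    X holds secret key vectors (one per bit). Each violated bit is fixed by
    the smallest move of w along its key that reaches +/-margin. Hypothetical
    scheme for illustration only.
    """
    w = w.astype(float).copy()
    for i, b in enumerate(bits):
        target = margin if b == 1 else -margin
        proj = float(X[i] @ w)
        if (proj - target) * (1 if b else -1) < 0:          # bit not yet satisfied
            w += (target - proj) * X[i] / float(X[i] @ X[i])
    return w

def extract_watermark(w: np.ndarray, X: np.ndarray):
    """Read the bit string back from projection signs."""
    return [1 if float(X[i] @ w) > 0 else 0 for i in range(len(X))]
```

With orthonormal keys the per-bit corrections do not interfere with each other, so embedding is exact and the weight change stays small.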
Privacy-Preserving Edge Speech Understanding with Tiny Foundation Models
Benazir, Afsara, Lin, Felix Xiaozhu
Robust speech recognition systems rely on cloud service providers for inference, which requires ensuring that an untrustworthy provider cannot deduce sensitive content from the speech. At the same time, sanitizing speech content must avoid compromising transcription accuracy. Realizing the under-utilized capabilities of tiny speech foundation models (FMs), we propose, for the first time, a novel use: enhancing speech privacy on resource-constrained devices. We introduce XYZ, an edge/cloud privacy-preserving speech inference engine that can filter sensitive entities without compromising transcript accuracy. We use a timestamp-based on-device masking approach that applies a token-to-entity prediction model to filter sensitive entities. Our choice of mask strategically conceals parts of the input and hides sensitive data. The masked input is sent to a trusted cloud service or to a local hub to generate the masked output. The effectiveness of XYZ hinges on how well the entity time segments are masked. Our recovery is a confidence-score-based approach that chooses the best prediction between the cloud and on-device models. We implement XYZ on a 64-bit Raspberry Pi 4B. Experiments show that our solution leads to robust speech recognition without forsaking privacy. XYZ, with < 100 MB of memory, achieves state-of-the-art (SOTA) speech transcription performance while filtering about 83% of private entities directly on-device. XYZ is 16x smaller in memory and 17x more compute-efficient than prior privacy-preserving speech frameworks, and reduces word error rate (WER) by 38.8-77.5% relative to existing offline transcription services.
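The timestamp-based on-device masking can be pictured as zeroing the audio samples that fall inside predicted entity time spans before the clip leaves the device. A minimal sketch of just the sample-rate bookkeeping (in the real system, the spans would come from the token-to-entity prediction model):

```python
def mask_segments(samples, sample_rate, entity_spans):
    """Zero out (mask) audio in [start, end) second spans flagged as sensitive.

    samples: sequence of audio samples
    sample_rate: samples per second
    entity_spans: list of (start_seconds, end_seconds) for sensitive entities
    """
    masked = list(samples)
    for start_s, end_s in entity_spans:
        lo = int(start_s * sample_rate)
        hi = min(int(end_s * sample_rate), len(masked))
        for i in range(lo, hi):
            masked[i] = 0.0   # silence replaces the sensitive region
    return masked
```

Whatever recognizer later transcribes the clip sees only silence where the entities were, which is why the quality of the span predictions determines the privacy/accuracy trade-off.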
Stealthy Backdoor Attack to Real-world Models in Android Apps
Wei, Jiali, Fan, Ming, Zhang, Xicheng, Jiao, Wenjing, Wang, Haijun, Liu, Ting
Powered by their superior performance, deep neural networks (DNNs) have found widespread applications across various domains. Many deep learning (DL) models are now embedded in mobile apps, making them more accessible to end users through on-device DL. However, deploying on-device DL to users' smartphones simultaneously introduces several security threats. One primary threat is backdoor attacks. Extensive research has explored backdoor attacks for several years and has proposed numerous attack approaches. However, few studies have investigated backdoor attacks on DL models deployed in the real world, or they have shown obvious deficiencies in effectiveness and stealthiness. In this work, we explore more effective and stealthy backdoor attacks on real-world DL models extracted from mobile apps. Our main justification is that imperceptible and sample-specific backdoor triggers generated by DNN-based steganography can enhance the efficacy of backdoor attacks on real-world models. We first confirm the effectiveness of steganography-based backdoor attacks on four state-of-the-art DNN models. Subsequently, we systematically evaluate and analyze the stealthiness of the attacks to ensure they are difficult to perceive. Finally, we implement the backdoor attacks on real-world models and compare our approach with three baseline methods. We collect 38,387 mobile apps, extract 89 DL models from them, and analyze these models to obtain the prerequisite model information for the attacks. After identifying the target models, our approach achieves an average of 12.50% higher attack success rate than DeepPayload while better maintaining the normal performance of the models. Extensive experimental results demonstrate that our method enables more effective, robust, and stealthy backdoor attacks on real-world models.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Spain > Galicia > Madrid (0.04)
- (9 more...)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
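The backdoor paper above generates imperceptible, sample-specific triggers with DNN-based steganography. As a much simpler classical stand-in, least-significant-bit (LSB) steganography shows the underlying idea: a payload is hidden with at most a one-intensity-level change per pixel, which is visually imperceptible:

```python
import numpy as np

def embed_lsb_trigger(image: np.ndarray, payload_bits) -> np.ndarray:
    """Hide trigger bits in the least-significant bits of the first pixels.

    Classical LSB steganography, used here only to illustrate imperceptible
    payload embedding; the paper uses learned DNN-based steganography.
    """
    flat = image.astype(np.uint8).ravel().copy()
    for i, bit in enumerate(payload_bits):
        flat[i] = (flat[i] & 0xFE) | bit      # clear LSB, then set it to the bit
    return flat.reshape(image.shape)

def extract_lsb(image: np.ndarray, n: int):
    """Read the first n hidden bits back out."""
    return [int(v & 1) for v in image.ravel()[:n]]
```

Each modified pixel changes by at most one intensity level out of 256, which is the sense in which such triggers are hard to perceive.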
Privacy-Enhanced Training-as-a-Service for On-Device Intelligence: Concept, Architectural Scheme, and Open Problems
Wu, Zhiyuan, Sun, Sheng, Wang, Yuwei, Liu, Min, Gao, Bo, He, Tianliu, Wang, Wen
On-device intelligence (ODI) enables artificial intelligence (AI) applications to run on end devices, providing real-time and customized AI inference without relying on remote servers. However, training models for on-device deployment faces significant challenges due to the decentralized and privacy-sensitive nature of users' data, along with end-side constraints related to network connectivity, computation efficiency, etc. Existing training paradigms, such as cloud-based training, federated learning, and transfer learning, fail to sufficiently address these practical constraints that are prevalent for devices. To overcome these challenges, we propose Privacy-Enhanced Training-as-a-Service (PTaaS), a novel service computing paradigm that provides privacy-friendly, customized AI model training for end devices. PTaaS outsources the core training process to remote and powerful cloud or edge servers, efficiently developing customized on-device models based on uploaded anonymous queries, enhancing data privacy while reducing the computation load on individual devices. We explore the definition, goals, and design principles of PTaaS, alongside emerging technologies that support the PTaaS paradigm. An architectural scheme for PTaaS is also presented, followed by a series of open problems that set the stage for future research directions in the field of PTaaS.
- Asia > China > Beijing > Beijing (0.06)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Virginia (0.04)
- (3 more...)
- Instructional Material (1.00)
- Research Report (0.64)
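PTaaS uploads anonymous queries to a server that trains customized models. One illustrative client-side mechanism, not taken from the paper, is to pseudonymize the user identifier and perturb numeric features with Laplace noise before upload:

```python
import hashlib
import random

def anonymize_query(features: dict, user_id: str, epsilon: float = 1.0, seed=None) -> dict:
    """Strip the direct identifier and noise numeric features before upload.

    Hypothetical sketch: a hashed pseudonym plus Laplace noise (sampled as the
    difference of two exponentials with rate epsilon) stands in for whatever
    anonymization mechanism a real PTaaS client would use.
    """
    rng = random.Random(seed)
    pseudo_id = hashlib.sha256(user_id.encode()).hexdigest()[:16]  # stable pseudonym
    noisy = {k: v + rng.expovariate(epsilon) - rng.expovariate(epsilon)
             for k, v in features.items()}
    return {"pseudo_id": pseudo_id, "features": noisy}
```

A hashed identifier alone is not true anonymization (it is linkable across queries), which is exactly the kind of gap the paper's open problems on privacy mechanisms concern.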
Large Language Models for Expansion of Spoken Language Understanding Systems to New Languages
Hoscilowicz, Jakub, Pawlowski, Pawel, Skorupa, Marcin, Sowański, Marcin, Janicki, Artur
Spoken Language Understanding (SLU) models are a core component of voice assistants (VAs), such as Alexa, Bixby, and Google Assistant. In this paper, we introduce a pipeline designed to extend SLU systems to new languages, utilizing Large Language Models (LLMs) that we fine-tune for machine translation of slot-annotated SLU training data. Our approach improved on the MultiATIS++ benchmark, a primary multi-language SLU dataset, in the cloud scenario using an mBERT model. Specifically, we saw an improvement in the Overall Accuracy metric from 53% to 62.18%, compared to the existing state-of-the-art method, the Fine and Coarse-grained Multi-Task Learning Framework (FC-MTLF). In the on-device scenario (a tiny, non-pretrained SLU model), our method improved Overall Accuracy from 5.31% to 22.06% over the baseline Global-Local Contrastive Learning Framework (GL-CLeF) method. Contrary to both FC-MTLF and GL-CLeF, our LLM-based machine translation does not require changes in the production architecture of SLU. Additionally, our pipeline is slot-type independent: it does not require any slot definitions or examples.
- Europe > Poland > Masovia Province > Warsaw (0.04)
- Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
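A common way to carry slot annotations through machine translation, as the pipeline above requires, is to wrap slot values in inline markers that the translation model is fine-tuned to preserve. A sketch with a hypothetical marker format (the paper does not publish its exact annotation encoding):

```python
import re

def wrap_slots(utterance: str, slots) -> str:
    """Wrap each (slot_type, value) pair in bracket markers so a translation
    model can carry the annotation across languages."""
    out = utterance
    for slot_type, value in slots:
        out = out.replace(value, f"[{slot_type}: {value}]", 1)
    return out

def unwrap_slots(translated: str):
    """Recover (slot_type, value) pairs and the plain translated text."""
    pairs = re.findall(r"\[([^:\]]+): ([^\]]+)\]", translated)
    plain = re.sub(r"\[[^:\]]+: ([^\]]+)\]", r"\1", translated)
    return plain, [(t, v) for t, v in pairs]
```

After translation, unwrapping yields both the target-language utterance and the slot values realigned in it, which is what makes the approach slot-type independent: the markers carry arbitrary labels without needing slot definitions.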
Investigating White-Box Attacks for On-Device Models
Zhou, Mingyi, Gao, Xiang, Wu, Jing, Liu, Kui, Sun, Hailong, Li, Li
Numerous mobile apps have leveraged deep learning capabilities. However, on-device models are vulnerable to attacks as they can be easily extracted from their corresponding mobile apps. Existing on-device attacking approaches only generate black-box attacks, which are far less effective and efficient than white-box strategies. This is because mobile deep learning frameworks like TFLite do not support gradient computing, which is necessary for white-box attacking algorithms. Thus, we argue that existing findings may underestimate the harmfulness of on-device attacks. To this end, we conduct a study to answer this research question: Can on-device models be directly attacked via white-box strategies? We first systematically analyze the difficulties of transforming the on-device model to its debuggable version, and propose a Reverse Engineering framework for On-device Models (REOM), which automatically reverses the compiled on-device TFLite model to a debuggable model. Specifically, REOM first transforms compiled on-device models into the Open Neural Network Exchange (ONNX) format, then removes the non-debuggable parts, and converts them into a debuggable DL model format that attackers can exploit in a white-box setting. Our experimental results show that our approach is effective in achieving automated transformation among 244 TFLite models. Compared with previous attacks using surrogate models, REOM enables attackers to achieve higher attack success rates with a hundred times smaller attack perturbations. In addition, because the ONNX platform has plenty of tools for model format exchanging, the proposed method based on the ONNX platform can be adapted to other model formats. Our findings emphasize the need for developers to carefully consider their model deployment strategies, and use white-box methods to evaluate the vulnerability of on-device models.
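Once a model is recovered in debuggable form, standard gradient-based attacks become possible. A tiny FGSM (fast gradient sign method) example on a logistic-regression "model" with a hand-derived gradient shows the kind of white-box step that such a debuggable model enables; the real attacks run on full DL models with autodiff:

```python
import numpy as np

def fgsm_attack(x: np.ndarray, w: np.ndarray, b: float, y_true: float,
                eps: float = 0.1) -> np.ndarray:
    """One FGSM step against a logistic-regression model sigmoid(x @ w + b).

    Perturbs x by eps * sign(d loss / d x), the move that most increases the
    binary cross-entropy loss per unit of L-infinity budget. Toy stand-in for
    white-box attacks on recovered on-device models.
    """
    z = x @ w + b
    p = 1.0 / (1.0 + np.exp(-z))     # model's predicted probability
    grad_x = (p - y_true) * w        # d(BCE)/dx for a sigmoid output
    return x + eps * np.sign(grad_x)
```

The sign of the input gradient is exactly what black-box attackers lack, which is why gradient access from a debuggable model makes attacks both stronger and lower-perturbation.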
Towards an On-device Agent for Text Rewriting
Zhu, Yun, Liu, Yinxiao, Stahlberg, Felix, Kumar, Shankar, Chen, Yu-hui, Luo, Liangchen, Shu, Lei, Liu, Renjie, Chen, Jindong, Meng, Lei
Large Language Models (LLMs) have demonstrated impressive capabilities for text rewriting. Nonetheless, the large sizes of these models make them impractical for on-device inference, which would otherwise allow for enhanced privacy and economical inference. Creating a smaller yet potent language model for text rewriting presents a formidable challenge because it requires balancing the need for a small size against the need to retain the emergent capabilities of the LLM, which requires costly data collection. To address this challenge, we introduce a new instruction tuning approach for building a mobile-centric text rewriting model. Our strategies enable the generation of high-quality training data without any human labeling. In addition, we propose a heuristic reinforcement learning framework which substantially enhances performance without requiring preference data. To further bridge the performance gap with the larger server-side model, we propose an effective approach that combines the mobile rewrite agent with the server model using a cascade. To tailor the text rewriting tasks to mobile scenarios, we introduce MessageRewriteEval, a benchmark that focuses on text rewriting for messages through natural language instructions. Through empirical experiments, we demonstrate that our on-device model surpasses the current state-of-the-art LLMs in text rewriting while maintaining a significantly reduced model size. Notably, we show that our proposed cascading approach improves model performance.
- Europe > Italy (0.04)
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > China > Heilongjiang Province > Daqing (0.04)
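The cascading strategy in "Towards an On-device Agent for Text Rewriting" above (serve the on-device rewrite when the model is confident, otherwise fall back to the server model) can be sketched as follows; the confidence interface and threshold are assumptions, not details from the paper:

```python
def cascade_rewrite(text, device_model, server_model, conf_threshold=0.8):
    """Serve the on-device rewrite when it is confident enough; otherwise
    fall back to the larger server-side model.

    device_model(text) -> (rewrite, confidence); server_model(text) -> rewrite.
    Returns (rewrite, which_model_served_it).
    """
    rewrite, confidence = device_model(text)
    if confidence >= conf_threshold:
        return rewrite, "device"          # cheap, private, on-device path
    return server_model(text), "server"   # quality fallback for hard inputs
```

The threshold trades cost for quality: lowering it keeps more traffic on-device, while raising it routes more of the hard inputs to the server model.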